SPRING AI - COMPLETE COMPREHENSIVE ROADMAP
From Beginner to Advanced | 2025-2026 Edition
1. WHAT IS SPRING AI?
Spring AI is an application framework for AI Engineering built on top of the Spring ecosystem. It is the Java/Spring answer to Python's LangChain and LlamaIndex, designed to make enterprise-grade AI application development accessible to the world's largest base of Java developers.
1.1 Core Mission
- Apply Spring ecosystem principles (portability, modularity, POJO-based design) to the AI domain.
- Connect enterprise Data and APIs with AI Models through a clean, unified API.
- Enable Java developers to build production-ready AI applications without switching languages.
1.2 Working Principle
Spring AI is organized around a three-layer provider/abstraction/application model:
- AI Provider Layer: OpenAI, Anthropic, Azure OpenAI, Google Gemini, Amazon Bedrock, Ollama, Mistral, etc.
- Spring AI Abstraction Layer: unified interfaces such as ChatModel, EmbeddingModel, ImageModel, etc.
- Application Layer: your Spring Boot application using ChatClient, VectorStore, Advisors, Tools, etc.
The framework auto-configures AI model clients via Spring Boot starters. Developers interact with provider-agnostic interfaces, allowing seamless provider switching without business logic rewrites.
1.3 Key Design Principles
- Portability: Switch AI providers without changing application code.
- Modularity: Use only the AI features you need via starter dependencies.
- POJO-based: Map AI outputs directly to Java objects (Structured Output).
- Production-ready: Built-in observability, evaluation, memory, and ETL pipelines.
2. ARCHITECTURE DEEP-DIVE
2.1 High-Level Architecture Layers
+----------------------------------------------------+
|            Your Spring Boot Application            |
|  (REST APIs, Services, Repositories, Controllers)  |
+-------------------------+--------------------------+
                          |
+-------------------------v--------------------------+
|                Spring AI Core Layer                |
|  ChatClient | Advisors API | Tool Calling          |
|  VectorStore | RAG Pipeline | Memory               |
|  ETL Framework | Evaluation | Observability        |
+-------------------------+--------------------------+
                          |
+-------------------------v--------------------------+
|            Model Abstraction Interfaces            |
|  ChatModel | EmbeddingModel | ImageModel           |
|  AudioModel | ModerationModel | StreamingChatModel |
+-------------------------+--------------------------+
                          |
+-------------------------v--------------------------+
|              AI Provider Integrations              |
|  OpenAI | Anthropic | Azure OpenAI | Google        |
|  Amazon Bedrock | Ollama | Mistral | Groq          |
|  Hugging Face | Perplexity | ZhiPu | Moonshot      |
+----------------------------------------------------+
2.2 Core Components
ChatClient API
- The primary entry point for AI interactions.
- Fluent builder-style API (similar to WebClient/RestClient).
- Supports system prompts, user prompts, conversation memory, and advisors.
- Supports synchronous and reactive (streaming) responses.
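A minimal sketch of this fluent API, assuming the auto-configured ChatClient.Builder provided by whichever model starter is on the classpath (the service name is illustrative):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

@Service
public class AssistantService {

    private final ChatClient chatClient;

    // ChatClient.Builder is auto-configured by the model starter in use
    public AssistantService(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultSystem("You are a concise, helpful assistant.")
                .build();
    }

    // Synchronous call: blocks until the full answer is available
    public String ask(String question) {
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }

    // Streaming call: emits content chunks as the model generates them
    public Flux<String> askStreaming(String question) {
        return chatClient.prompt()
                .user(question)
                .stream()
                .content();
    }
}
```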
ChatModel Interface
- Provider-agnostic abstraction over any LLM.
- Implementations: OpenAiChatModel, AnthropicChatModel, AzureOpenAiChatModel, OllamaChatModel, etc.
- Returns ChatResponse containing Generation objects.
EmbeddingModel Interface
- Converts text to vector embeddings.
- Used by RAG pipelines to semantically index and search documents.
- Implementations for OpenAI, Azure, Ollama, HuggingFace, etc.
VectorStore Interface
- Stores and retrieves vector embeddings with semantic similarity search.
- Supported Stores: PGVector, Chroma, Milvus, Redis, Pinecone, Weaviate, Qdrant, MongoDB Atlas, Neo4j, Oracle, Azure AI Search, OpenSearch, Apache Cassandra, Elasticsearch, GemFire.
Advisors API
- Encapsulates recurring patterns: RAG, memory, logging, safety guardrails.
- Key Advisors:
- QuestionAnswerAdvisor: Injects retrieved context into prompts (RAG).
- MessageChatMemoryAdvisor: Manages conversation history.
- PromptChatMemoryAdvisor: Injects memory into prompts.
- SafeGuardAdvisor: Blocks sensitive content.
- SimpleLoggerAdvisor: Logs requests and responses.
- ReReadingAdvisor: Implements Re-reading prompt technique.
Tool Calling / Function Calling
- Enables AI models to invoke Java methods at runtime.
- @Tool annotation marks Spring beans as callable tools.
- Spring AI auto-generates JSON schema from method signatures.
- Supports tool resolution, error handling, and result injection.
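A sketch of a tool bean following the conventions above; the weather lookup itself is a hypothetical stand-in that would normally call a real API:

```java
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.stereotype.Component;

@Component
public class WeatherTools {

    // The description guides the model's decision to call this tool;
    // Spring AI derives the JSON schema from the method signature.
    @Tool(description = "Get the current temperature in Celsius for a city")
    public double currentTemperature(
            @ToolParam(description = "City name, e.g. Berlin") String city) {
        // Hypothetical lookup -- replace with a real weather service call
        return 21.5;
    }
}
```

At request time the tool can be attached with something like chatClient.prompt().user(question).tools(new WeatherTools()).call().content(), so the model may invoke it before answering.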
Document ETL Pipeline
- Extract: DocumentReader implementations (PDF, Word, HTML, CSV, JSON, YouTube, GitHub, S3, etc.)
- Transform: TextSplitter, MetadataEnricher, TokenCountEstimator, etc.
- Load: VectorStore with embedding generation.
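A sketch of the three ETL stages wired together, assuming the Tika document reader module and an auto-configured VectorStore are available; the service shape is illustrative:

```java
import org.springframework.ai.document.Document;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

import java.util.List;

@Service
public class IngestionService {

    private final VectorStore vectorStore;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public void ingest(Resource file) {
        // Extract: raw text + metadata from (almost) any file format
        List<Document> documents = new TikaDocumentReader(file).get();

        // Transform: split into token-bounded chunks
        List<Document> chunks = new TokenTextSplitter().apply(documents);

        // Load: embed and persist (the vector store calls the EmbeddingModel)
        vectorStore.add(chunks);
    }
}
```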
RAG (Retrieval-Augmented Generation)
- Simple RAG: QuestionAnswerAdvisor with a VectorStore.
- Modular RAG: Fully customizable pipeline: Query Analysis -> Retrieval -> Post-Retrieval -> Augmentation -> Generation.
- Components: QueryTransformer, DocumentRetriever, DocumentPostProcessor, ContextualQueryAugmenter.
Memory Management
- InMemoryChatMemoryRepository: Session-based in-memory storage.
- JdbcChatMemoryRepository: Database-backed conversation persistence.
- RedisChatMemoryRepository: Redis-backed memory (added in 2.0-M1).
- CassandraChatMemoryRepository: Cassandra-backed memory.
Model Context Protocol (MCP)
- Standard protocol connecting AI models to external tools and data sources.
- Spring AI provides MCP Client and MCP Server implementations.
- @Tool annotation automatically exposes Spring beans as MCP-compliant tools.
- Supports stdio and HTTP-based SSE transports.
- OAuth2-secured MCP server connections.
Observability
- Built on Micrometer for metrics and tracing.
- Tracks token usage, latency, model parameters, and request metadata.
- Integrates with Zipkin, Jaeger, Prometheus, Grafana, and OpenTelemetry.
Structured Output
- Maps LLM text responses to Java objects using BeanOutputConverter, MapOutputConverter.
- Uses Jackson for JSON marshalling.
- Enables type-safe AI responses as POJOs.
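A sketch of type-safe structured output using the ChatClient entity() shortcut, which applies BeanOutputConverter internally; the record shape and prompt are illustrative:

```java
import org.springframework.ai.chat.client.ChatClient;

// Target POJO: the converter appends JSON format instructions to the prompt
record MovieRecommendation(String title, int year, String reason) {}

public class MovieService {

    private final ChatClient chatClient;

    public MovieService(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    public MovieRecommendation recommend(String genre) {
        return chatClient.prompt()
                .user("Recommend one %s movie.".formatted(genre))
                .call()
                .entity(MovieRecommendation.class); // BeanOutputConverter under the hood
    }
}
```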
AI Model Evaluation
- EvaluationRequest / EvaluationResponse model.
- RelevancyEvaluator: Checks if response is relevant to the query.
- FactCheckingEvaluator: Validates factual correctness.
- Used for automated quality assurance of AI outputs.
2.3 Spring AI Agents Architecture
Agents combine Planning + Memory + Actions to solve user tasks autonomously.
Workflow Agents (Predictable)
- LLMs and tools orchestrated through predefined, prescriptive paths.
- Better for well-defined, repeatable tasks.
- Components: Chain of tools, conditional branching, retry logic.
Autonomous Agents (Flexible)
- LLMs decide which tools to use and in what order.
- Better for open-ended, exploratory tasks.
- Components: ReAct (Reason + Act) loop, tool pool, termination conditions.
Agent Patterns
- ReAct Agent: Thought -> Action -> Observation loop.
- Plan-and-Execute Agent: First plan all steps, then execute.
- Reflection Agent: Self-evaluates and re-runs if response quality is low.
- Multi-Agent: Multiple specialized agents collaborating via MCP.
3. HARDWARE & INFRASTRUCTURE REQUIREMENTS
3.1 Development Environment
Minimum Requirements (Local Development with Cloud APIs)
- CPU: 4-core modern processor (Intel i5 / AMD Ryzen 5 or better)
- RAM: 16 GB (8 GB minimum, 32 GB recommended for large projects)
- Storage: 50 GB SSD free space
- OS: Windows 10+, macOS 12+, Ubuntu 20.04+
- Java: JDK 17 minimum (JDK 21+ recommended for virtual threads)
- Build Tool: Maven 3.9+ or Gradle 8+
- IDE: IntelliJ IDEA (recommended), VS Code + Java Extension Pack, Eclipse
For Local Model Inference (Ollama)
- CPU: 8-core modern processor (Apple M-series or AMD Ryzen 7+)
- RAM: 32 GB minimum (64 GB for large models like Llama 3.1 70B)
- GPU: NVIDIA RTX 3060+ (12 GB VRAM) for GPU acceleration
- RTX 3090 / RTX 4090 (24 GB VRAM): for 70B models
- Apple M2 Ultra / M3 Max: unified memory handles 70B models efficiently
- Storage: 100-500 GB SSD (models range from 4 GB to 140 GB)
3.2 Production Infrastructure
Cloud API-Based Deployment (Recommended for Most Teams)
- AWS EC2 / Azure VM / GCP Compute: t3.medium to t3.xlarge (2-4 vCPU, 4-16 GB RAM)
- Kubernetes: Recommended for scaling; HPA based on token usage metrics
- Docker: Spring Boot containerized with Docker
- Databases: PostgreSQL (pgvector), Redis, MongoDB Atlas for memory and vector storage
Self-Hosted LLM Inference (Enterprise)
- GPU Servers: NVIDIA A100 (80 GB), H100 (80 GB), or RTX A6000 (48 GB)
- Memory: 256-512 GB RAM for large inference clusters
- Network: 10 Gbps+ internal networking for distributed inference
- Software: vLLM, Ollama, llama.cpp, TGI (Text Generation Inference), Triton Inference Server
Vector Database Infrastructure
- PGVector: Extension on existing PostgreSQL; minimal additional hardware
- Pinecone / Weaviate Cloud: SaaS; no hardware management
- Chroma: Lightweight, good for development; needs 8 GB+ RAM in production
- Milvus Distributed: Kubernetes cluster; requires etcd + MinIO + multiple nodes
3.3 Networking & Security Requirements
- Outbound HTTPS (443): Required for cloud AI API calls (OpenAI, Anthropic, Azure, AWS)
- API Keys: Stored in environment variables or secrets management (HashiCorp Vault, AWS Secrets Manager)
- mTLS: For secure MCP server connections
- OAuth2: For MCP server authentication (Spring AI 1.1+)
- Rate Limiting: Configure per API provider's rate limits (tokens per minute, requests per minute)
- VPC/Private Networking: For enterprise deployments connecting on-premises databases
4. STRUCTURED LEARNING PATH
PHASE 0 - FOUNDATIONS (Weeks 1-3)
4.0.1 Java & Spring Boot Prerequisites
- Java 17+ features: Records, Sealed Classes, Pattern Matching, Text Blocks, Virtual Threads
- Spring Boot 3.x: Auto-configuration, Starters, Application Properties, Profiles
- Spring Web MVC: @RestController, @GetMapping, @PostMapping, ResponseEntity
- Spring WebFlux: Reactive Streams, Mono, Flux (for streaming AI responses)
- Spring Data JPA: Repositories, Entities, JPQL (for memory persistence)
- Maven / Gradle: Dependency management, BOM imports, build lifecycle
- Docker Basics: Containers, images, docker-compose for local services
4.0.2 AI/ML Concepts for Developers
- What are Large Language Models (LLMs)? Transformers, tokens, context windows
- Prompt Engineering: System prompts, user prompts, few-shot examples, chain-of-thought
- Temperature & Sampling: What temperature, top-p, and top-k mean for output quality
- Embeddings: What are vector embeddings? Semantic similarity, cosine distance
- Tokens & Pricing: How LLM APIs charge per token; input vs output tokens
- Hallucinations: What they are, why they happen, how to mitigate them
- RAG Basics: Why retrieval-augmented generation reduces hallucinations
- Fine-Tuning vs Prompting: When to use each approach
4.0.3 Spring AI Introduction
- What Spring AI is and what problems it solves
- Comparison with LangChain (Python) and LangChain4j (Java)
- Spring AI project structure, GitHub repository, documentation
- Spring Initializr: Creating a Spring AI project at start.spring.io
- Adding Spring AI BOM and starter dependencies
- Basic project setup: API key configuration in application.yaml
PHASE 1 - CORE FUNDAMENTALS (Weeks 4-7)
4.1.1 ChatClient & ChatModel
- ChatClient.Builder auto-configuration
- Creating a simple chat service: prompt -> response
- System prompt configuration: setting AI persona and behavior
- User prompt construction: PromptTemplate, dynamic variable substitution
- Synchronous chat: call().content()
- Streaming chat: stream().content() returning Flux
- Response metadata: token usage, model name, finish reason
- ChatOptions: temperature, maxTokens, topP, stop sequences per-request
4.1.2 Prompt Engineering in Spring AI
- PromptTemplate: Parameterized prompts with {variable} substitution
- SystemPromptTemplate: Configuring AI behavior and persona
- Few-shot prompting: Providing examples in prompts
- Chain-of-thought prompting: Getting AI to reason step-by-step
- Output format instructions: JSON, XML, structured formats
- Loading prompts from classpath resources (.st files)
- Message types: SystemMessage, UserMessage, AssistantMessage, ToolResponseMessage
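A sketch of parameterized prompting with PromptTemplate; the template text and variable names are illustrative:

```java
import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;

import java.util.Map;

public class PromptExamples {

    public Prompt buildPrompt(String topic, String audience) {
        PromptTemplate template = new PromptTemplate("""
                Explain {topic} to {audience} in three short bullet points.
                """);
        // {topic} and {audience} are substituted when the prompt is rendered
        return template.create(Map.of("topic", topic, "audience", audience));
    }
}
```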
4.1.3 Structured Output
- BeanOutputConverter: Map LLM output to Java POJOs
- MapOutputConverter: LLM output to Map
- ListOutputConverter: LLM output to List
- Using @JsonProperty and @JsonPropertyDescription on output POJOs
- Error handling for malformed LLM output
- Combining structured output with validation (Bean Validation API)
4.1.4 AI Model Providers Configuration
- OpenAI: GPT-4o, GPT-4o-mini, GPT-5, GPT-5-mini configuration
- Anthropic: Claude Opus, Sonnet, Haiku configuration
- Azure OpenAI: Deployment names, endpoints, API versions
- Ollama: Local model setup, model pulling, endpoint configuration
- Google Vertex AI Gemini: Project, location, model configuration
- Amazon Bedrock: AWS credentials, region, model IDs
- Mistral AI: API key, model selection
- Groq: Ultra-fast inference configuration
PHASE 2 - EMBEDDINGS & VECTOR STORES (Weeks 8-11)
4.2.1 Embeddings
- What are embeddings and why they matter for AI applications
- EmbeddingModel interface and call() method
- OpenAI Embeddings: text-embedding-3-small vs text-embedding-3-large
- Dimensionality: 1536-dim vs 3072-dim vectors
- Batch embedding: Embed multiple texts efficiently
- EmbeddingRequest / EmbeddingResponse model
- Cosine similarity: How to compare embedding vectors manually
- Use cases: Semantic search, deduplication, clustering, classification
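A sketch of embedding two texts and comparing them manually; it assumes a 1.x EmbeddingModel where embed(String) returns a float[]:

```java
import org.springframework.ai.embedding.EmbeddingModel;

public class SimilarityExample {

    private final EmbeddingModel embeddingModel;

    public SimilarityExample(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    public double similarity(String a, String b) {
        float[] va = embeddingModel.embed(a);
        float[] vb = embeddingModel.embed(b);
        return cosine(va, vb);
    }

    // cosine(a, b) = (a . b) / (|a| * |b|)
    private static double cosine(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```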
4.2.2 Vector Stores
- VectorStore interface: add(), similaritySearch(), delete()
- SearchRequest: query, topK, similarityThreshold, metadata filters
- SimpleVectorStore: In-memory, for development and testing
- PGVector Setup: PostgreSQL with pgvector extension, Spring Data integration
- Redis Vector Store: Configuration and metadata filtering
- Chroma: Docker setup, collection management
- Milvus: Cloud and self-hosted configuration
- Pinecone: Cloud vector database integration
- Weaviate: Schema-less vector store with hybrid search
- MongoDB Atlas Vector Search: Atlas cluster configuration
- Metadata Filtering: Type-safe metadata filter expressions
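A sketch of a similarity search with a metadata filter, assuming the 1.x builder-style SearchRequest; the filter field is illustrative:

```java
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

import java.util.List;

public class SearchExample {

    private final VectorStore vectorStore;

    public SearchExample(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    public List<Document> findPolicyDocs(String question) {
        SearchRequest request = SearchRequest.builder()
                .query(question)
                .topK(5)                                 // return at most 5 chunks
                .similarityThreshold(0.7)                // drop weak matches
                .filterExpression("department == 'HR'")  // metadata filter (illustrative field)
                .build();
        return vectorStore.similaritySearch(request);
    }
}
```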
4.2.3 Document Processing (ETL Pipeline)
- DocumentReader implementations:
- TextReader, JsonReader, CsvReader
- PdfDocumentReader (Apache PDFBox, Tika)
- TikaDocumentReader: Handles 1000+ file formats
- WordDocumentReader, PowerPointDocumentReader
- HtmlDocumentReader, MarkdownDocumentReader
- GithubDocumentReader: Reading from repositories
- YouTubeDocumentReader: Transcript extraction
- S3DocumentReader, AzureBlobStorageReader, GoogleCloudStorageReader
- KafkaDocumentReader, MongoDocumentReader, JdbcDocumentReader
- TextSplitter implementations:
- TokenTextSplitter: Split by token count (recommended)
- CharacterTextSplitter: Split by character count
- SentenceTransformersTokenTextSplitter: Semantic sentence boundary splitting
- RecursiveCharacterTextSplitter: Hierarchical splitting strategy
- Document Transformers:
- MetadataEnricher: Add custom metadata fields
- SummaryMetadataEnricher: Generate summaries using LLM and store as metadata
- KeywordMetadataEnricher: Extract keywords and store as metadata
- ContentFormatTransformer: Normalize content formats
PHASE 3 - RAG & ADVISORS (Weeks 12-16)
4.3.1 Basic RAG with QuestionAnswerAdvisor
- QuestionAnswerAdvisor: Automatic context injection from VectorStore
- Configuring retrieval: topK, similarity threshold, metadata filters
- Custom prompt templates: DEFAULT_USER_TEXT_ADVISE and DEFAULT_SYSTEM_TEXT_ADVISE
- Dynamic metadata filter expressions: runtime filter construction
- Combining multiple vector stores in RAG queries
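A sketch of simple RAG with QuestionAnswerAdvisor, assuming the 1.x builder APIs (the advisor's package location has moved between 1.x releases):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.vectorstore.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;

public class RagChatService {

    private final ChatClient chatClient;

    public RagChatService(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore)
                        .searchRequest(SearchRequest.builder()
                                .topK(5)
                                .similarityThreshold(0.7)
                                .build())
                        .build())
                .build();
    }

    public String answer(String question) {
        // Retrieved chunks are injected into the prompt before the model call
        return chatClient.prompt().user(question).call().content();
    }
}
```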
4.3.2 Modular RAG Architecture
Modular RAG pipeline stages:
- Query Analysis & Transformation
- Document Retrieval
- Post-Retrieval Processing
- Augmentation
- Generation
- Query Transformation:
- RewriteQueryTransformer: Rewrites user queries for better retrieval
- TranslationQueryTransformer: Translates queries to match document language
- MultiQueryExpander: Expands one query into multiple for broader retrieval
- CompressionQueryTransformer: Compresses context + query for follow-up questions
- StepBackQueryTransformer: Generates more abstract "step back" queries
- Document Retrieval:
- VectorStoreDocumentRetriever: Semantic vector-based retrieval
- BM25/Keyword Retrieval: Lexical search integration
- Hybrid Retrieval: Combining vector + keyword (RRF fusion)
- Post-Retrieval Processing:
- DocumentRanker: Re-rank documents by relevance (Cohere Rerank integration)
- DuplicateContentFilter: Removes semantically duplicate documents
- TokenBudgetContentFilter: Limits context to a token budget
- ConcatenationDocumentJoiner: Merges documents from multiple retrievers
- Augmentation:
- ContextualQueryAugmenter: Injects retrieved context into the prompt
- RetrievalAugmentationAdvisor: Wires together modular RAG pipeline
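A sketch wiring a modular RAG pipeline with RetrievalAugmentationAdvisor, assuming the 1.x builder APIs and package locations for the transformer and retriever components:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatModel;
import org.springframework.ai.rag.advisor.RetrievalAugmentationAdvisor;
import org.springframework.ai.rag.preretrieval.query.transformation.RewriteQueryTransformer;
import org.springframework.ai.rag.retrieval.search.VectorStoreDocumentRetriever;
import org.springframework.ai.vectorstore.VectorStore;

public class ModularRagConfig {

    public ChatClient modularRagClient(ChatModel chatModel, VectorStore vectorStore) {
        var advisor = RetrievalAugmentationAdvisor.builder()
                // Pre-retrieval: rewrite the raw user query for better recall
                .queryTransformers(RewriteQueryTransformer.builder()
                        .chatClientBuilder(ChatClient.builder(chatModel))
                        .build())
                // Retrieval: semantic search against the vector store
                .documentRetriever(VectorStoreDocumentRetriever.builder()
                        .vectorStore(vectorStore)
                        .topK(5)
                        .similarityThreshold(0.5)
                        .build())
                .build();

        return ChatClient.builder(chatModel)
                .defaultAdvisors(advisor)
                .build();
    }
}
```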
4.3.3 Chat Memory & Conversation History
- ChatMemory interface: add(), get(), clear()
- InMemoryChatMemoryRepository: Development use
- JdbcChatMemoryRepository: Persistent conversation history
- RedisChatMemoryRepository: Distributed memory
- MessageChatMemoryAdvisor: Adds memory to ChatClient conversations
- PromptChatMemoryAdvisor: Injects memory into system prompt
- Memory window size: Configuring how many past messages to include
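A sketch of windowed chat memory attached via MessageChatMemoryAdvisor, assuming the 1.x ChatMemory builder APIs; the conversation id handling is illustrative:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;

public class ConversationService {

    private final ChatClient chatClient;

    public ConversationService(ChatClient.Builder builder) {
        // Keep only the last 20 messages of each conversation in the prompt window
        ChatMemory chatMemory = MessageWindowChatMemory.builder()
                .maxMessages(20)
                .build();
        this.chatClient = builder
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(chatMemory).build())
                .build();
    }

    public String chat(String conversationId, String userMessage) {
        // The conversation id scopes which history is loaded and saved
        return chatClient.prompt()
                .user(userMessage)
                .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
                .call()
                .content();
    }
}
```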
4.3.4 Custom Advisors
- CallAroundAdvisor interface for synchronous advisors
- StreamAroundAdvisor interface for reactive advisors
- AdvisedRequest and AdvisedResponse models
- Advisor ordering with getOrder()
- Building a custom safety guardrail advisor
- Building a custom caching advisor
- Building a custom logging and audit advisor
- Advisor chains and composition
PHASE 4 - TOOL CALLING & AGENTS (Weeks 17-22)
4.4.1 Tool Calling Fundamentals
- @Tool annotation: Exposing Java methods as AI tools
- @ToolParam annotation: Describing tool parameters for the AI
- Tool description: Writing clear descriptions that guide AI tool selection
- Return value handling: String, POJO, void tools
- ToolContext: Passing application context to tools at runtime
- Tool error handling: Exception translation and error messages
4.4.2 Built-in Tool Integrations
- WebSearchTool: Real-time web search
- WikipediaTool: Wikipedia lookup
- WeatherTool: Weather data retrieval
- CalendarTool: Calendar integration
- DallETool: Image generation from within a conversation
4.4.3 Spring AI Agents
- Agent interface: Agent.call() and Agent.stream()
- ReAct Agent (Reasoning + Acting):
- Thought -> Action -> Observation loop
- Tool selection reasoning
- Termination conditions
- Maximum iterations configuration
- Plan-and-Execute Agent:
- Planning phase: Decomposing complex tasks into steps
- Execution phase: Executing each step with appropriate tools
- Replanning: Handling failed steps
- Chat Agent:
- Stateful conversation with tool access
- Memory integration
- Multi-turn reasoning
4.4.4 Model Context Protocol (MCP)
- MCP Client: Connecting to external MCP servers
- MCP Server: Exposing Spring application as an MCP server
- spring-ai-starter-mcp-client: Adding MCP client capability
- spring-ai-starter-mcp-server: Exposing Spring beans as MCP server
- Tool discovery: Listing available tools from MCP servers
- Resource access: Files, databases, APIs via MCP resources
- Multi-transport: stdio transport for local tools, SSE for remote
- OAuth2 MCP authentication (Spring AI 1.1+)
- Protocol versioning: 2024-11-05 and 2025-03-26 versions
PHASE 5 - MULTIMODAL & ADVANCED MODELS (Weeks 23-27)
4.5.1 Image Generation
- ImageModel interface and ImagePrompt
- OpenAI DALL-E 3: size, quality, style options
- Stability AI: Image generation with style prompts
- Azure OpenAI DALL-E: Azure-hosted image generation
- ImageResponse and handling base64 / URL responses
- Batch image generation
4.5.2 Multimodal Chat (Vision)
- Passing images to chat models: UserMessage with Media
- Media class: Data URI, URL, file path
- Supported vision models: GPT-4o, Claude 3.x, Gemini Pro Vision
- Document analysis: PDF/image document understanding
- Video frame analysis (Gemini)
- Use cases: Receipt parsing, diagram explanation, chart analysis
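A sketch of passing an image alongside a text prompt, assuming a vision-capable chat model is configured; the classpath resource name is illustrative:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.core.io.ClassPathResource;
import org.springframework.util.MimeTypeUtils;

public class VisionExample {

    public String describeReceipt(ChatClient chatClient) {
        // Attach the image as Media on the user message
        return chatClient.prompt()
                .user(u -> u.text("What is shown in this receipt? List merchant and total.")
                        .media(MimeTypeUtils.IMAGE_PNG, new ClassPathResource("receipt.png")))
                .call()
                .content();
    }
}
```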
4.5.3 Audio
- AudioTranscriptionModel: Speech-to-text
- OpenAI Whisper integration
- AudioSpeechModel: Text-to-speech
- OpenAI TTS models: tts-1, tts-1-hd
- Voice options: alloy, echo, fable, onyx, nova, shimmer
- Streaming audio responses
4.5.4 Moderation
- ModerationModel: Content safety classification
- OpenAI Moderation API integration
- Custom moderation pipelines with Advisor pattern
- Combining moderation with SafeGuardAdvisor
PHASE 6 - PRODUCTION, OBSERVABILITY & ADVANCED PATTERNS (Weeks 28-36)
4.6.1 Observability & Monitoring
- Micrometer integration: Automatic metrics on AI calls
- Key metrics: token.usage, latency, model, operation.name
- Tracing: Distributed tracing with Zipkin/Jaeger
- Prometheus + Grafana: AI dashboard setup
- OpenTelemetry: Vendor-neutral observability
- Cost tracking: Monitor token spend per endpoint/user
- Spring Boot Actuator: Health checks for AI model connectivity
4.6.2 AI Model Evaluation Framework
- EvaluationRequest / EvaluationResponse model
- RelevancyEvaluator: Is the answer relevant to the question?
- FactCheckingEvaluator: Is the answer factually grounded in context?
- Custom evaluators: Building domain-specific evaluators
- Automated test suites for RAG pipelines
- Regression testing: Detecting quality degradation on code changes
- A/B testing: Comparing two AI configurations
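A sketch of a relevancy check using RelevancyEvaluator, assuming the 1.x evaluation API in which a second model judges the answer against the retrieved context (package locations vary slightly between releases):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.document.Document;
import org.springframework.ai.evaluation.EvaluationRequest;
import org.springframework.ai.evaluation.EvaluationResponse;
import org.springframework.ai.evaluation.RelevancyEvaluator;

import java.util.List;

public class RagRelevancyCheck {

    public boolean isRelevant(ChatClient.Builder judgeBuilder,
                              String question,
                              List<Document> retrievedContext,
                              String answer) {
        // A separate "judge" model evaluates whether the answer addresses the question
        RelevancyEvaluator evaluator = new RelevancyEvaluator(judgeBuilder);
        EvaluationRequest request = new EvaluationRequest(question, retrievedContext, answer);
        EvaluationResponse response = evaluator.evaluate(request);
        return response.isPass();
    }
}
```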
4.6.3 Security & Safety
- API key management: Environment variables, Spring Cloud Vault
- Rate limiting AI endpoints: Bucket4j, Resilience4j integration
- Input sanitization: Preventing prompt injection attacks
- Output filtering: SafeGuardAdvisor, custom content filters
- PII redaction: Removing sensitive data from prompts and logs
- Audit logging: Full request/response audit trails
- GDPR compliance: Data retention policies for conversation memory
4.6.4 Performance Optimization
- Caching: Spring Cache on embedding generation results
- Async processing: @Async for non-blocking AI calls
- Connection pooling: HTTP client configuration for AI APIs
- Streaming responses: Reactive endpoint delivery
- Batch embedding: Processing documents in batches
- Model selection: Choosing the right model tier for the task
- Context window management: Summarization for long conversations
- Token optimization: Prompt compression techniques
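A sketch of caching embedding calls with Spring Cache, assuming @EnableCaching and a cache named "embeddings" are configured; identical inputs then skip the paid API call:

```java
import org.springframework.ai.embedding.EmbeddingModel;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class CachedEmbeddingService {

    private final EmbeddingModel embeddingModel;

    public CachedEmbeddingService(EmbeddingModel embeddingModel) {
        this.embeddingModel = embeddingModel;
    }

    // Repeated texts are served from the cache instead of re-embedding
    @Cacheable("embeddings")
    public float[] embed(String text) {
        return embeddingModel.embed(text);
    }
}
```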
4.6.5 Spring Boot Native Image (GraalVM)
- AOT (Ahead-of-Time) compilation improvements in Spring AI 2.0
- GraalVM native image support for Spring AI apps
- Performance benefits: Sub-second startup, reduced memory footprint
- Limitations and workarounds for reflection-heavy AI operations
4.6.6 Testing Spring AI Applications
- MockChatModel: Mock AI responses in unit tests
- TestcontainersOllamaService: Integration tests with Ollama
- VectorStore testing: In-memory vector stores for tests
- Advisor testing: Verifying advisor chain behavior
- Evaluation-driven testing: Using AI evaluators in test assertions
- WireMock: Mocking external AI API endpoints
5. ALGORITHMS, TECHNIQUES & TOOLS
5.1 Core AI Algorithms Used
Embedding & Similarity
- Cosine Similarity: Primary metric for semantic search in vector stores
- Dot Product Similarity: Alternative to cosine for normalized embeddings
- Euclidean Distance (L2): Distance-based similarity
- ANN (Approximate Nearest Neighbor) Search: HNSW algorithm in vector databases
- BM25: TF-IDF based lexical search for hybrid retrieval
Retrieval & Ranking
- RAG (Retrieval-Augmented Generation): Ground AI responses in retrieved context
- HyDE (Hypothetical Document Embeddings): Generate hypothetical answers to improve retrieval
- Multi-Query Retrieval: Expand user query into multiple variants for broader recall
- Step-Back Prompting: Generate abstract questions for better concept retrieval
- RRF (Reciprocal Rank Fusion): Combine rankings from multiple retrievers
- Contextual Compression: Compress retrieved documents to relevant snippets only
- Cross-Encoder Reranking: Neural reranking of retrieved documents (Cohere Rerank)
Prompt Engineering Techniques
- Zero-shot: Direct question without examples
- Few-shot: Provide examples to guide output format
- Chain-of-Thought (CoT): "Let's think step by step"
- Self-Consistency: Generate multiple answers, take majority vote
- Reflection / Self-Critique: AI evaluates and refines its own output
- Role Prompting: "You are an expert in X"
- Structured Output: "Respond only in JSON format"
- Tree-of-Thought (ToT): Explore multiple reasoning paths
Agent Reasoning Patterns
- ReAct (Reasoning + Acting): Interleave thought and action steps
- Plan-and-Execute: Explicit planning phase before execution
- Reflection: Loop where agent critiques its own output
- Multi-Agent Debate: Multiple agents argue to reach better conclusions
- Tool Augmented Generation: Invoking external tools to ground responses
5.2 Spring AI-Specific Techniques
Context Management
- Sliding Window Memory: Keep last N messages in context
- Summary Memory: Summarize old messages to save tokens
- Entity Memory: Extract and store key entities from conversations
- Semantic Chunking: Chunk documents at semantic boundaries
- Parent-Child Chunking: Store small chunks for retrieval but pass large parent chunks to LLM
Optimization Techniques
- Prompt Caching: Cache frequently used system prompts (Claude's prompt caching)
- Speculative Decoding: Faster inference using draft models
- Quantization Awareness: Choosing right model precision (FP16 vs INT4) in Ollama
- Streaming: Deliver AI tokens to client as they are generated (reduces TTFB)
- Batching: Process multiple embedding requests together
5.3 Major Tools & Technologies
Spring AI Ecosystem
- spring-ai-bom: Bill of Materials for dependency management
- spring-ai-openai-spring-boot-starter
- spring-ai-anthropic-spring-boot-starter
- spring-ai-ollama-spring-boot-starter
- spring-ai-vertex-ai-gemini-spring-boot-starter
- spring-ai-bedrock-converse-spring-boot-starter
- spring-ai-pgvector-store-spring-boot-starter
- spring-ai-redis-store-spring-boot-starter
- spring-ai-chroma-store-spring-boot-starter
- spring-ai-milvus-store-spring-boot-starter
- spring-ai-starter-mcp-server
- spring-ai-starter-mcp-client
- spring-ai-tika-document-reader
- spring-ai-pdf-document-reader
AI Model Providers
- OpenAI API: GPT-4o, GPT-5, DALL-E 3, Whisper, TTS, Embeddings
- Anthropic API: Claude 3.5/3.7 Sonnet, Claude 3 Opus, Claude 3 Haiku
- Google Vertex AI: Gemini 1.5 Pro, Gemini 2.0, Gemini Ultra
- Amazon Bedrock: Claude, Titan, Llama, Mistral on AWS
- Azure OpenAI: GPT-4o on Microsoft Azure
- Ollama: Local LLM inference (Llama 3.x, Phi-4, Mistral, Gemma, Qwen)
- Groq: Ultra-low latency inference
- Mistral AI: Mistral Large, Mixtral models
- Hugging Face: Open-source model inference
Vector Databases
- PGVector: PostgreSQL extension (best for teams already on Postgres)
- Chroma: Lightweight, developer-friendly
- Milvus: High-performance, cloud-native
- Pinecone: Fully managed cloud vector database
- Weaviate: Multi-modal, hybrid search
- Qdrant: Rust-based, high performance
- Redis Vector: If already using Redis
- MongoDB Atlas: If already using MongoDB
- Neo4j Vector: If using graph databases
Supporting Infrastructure
- Docker / Docker Compose: Local service orchestration
- Kubernetes: Production container orchestration
- Testcontainers: Integration testing with real containers
- Prometheus + Grafana: Metrics visualization
- Zipkin / Jaeger: Distributed tracing
- HashiCorp Vault: Secret management for API keys
- PostgreSQL: Conversation memory persistence + PGVector
- Redis: Session memory, caching, rate limiting
- Apache Kafka: Event streaming for AI pipelines
- Spring Cloud Config: Centralized AI configuration management
Build & Developer Tools
- IntelliJ IDEA + Spring Boot Plugin
- start.spring.io: Project scaffolding
- Spring CLI: Rapid project generation
- OpenRewrite: Automated migration between Spring AI versions
- Arconia Spring AI Migrations: Migration recipes for Spring AI upgrades
- Maven 3.9+ / Gradle 8+: Build tools
6. DESIGN & DEVELOPMENT PROCESS
6.1 Forward Design Process (Scratch to Advanced)
Stage 1: Problem Definition & Requirements
- Define the AI use case: Chat, RAG, Agent, or multimodal
- Identify data sources: Documents, databases, APIs
- Select AI provider: Cloud vs local, cost vs capability
- Define quality requirements: Response latency, accuracy, safety
- Map data flow: User -> Application -> AI -> Response
Stage 2: Project Setup
- Go to start.spring.io
- Select: Spring Boot 3.4+, Java 21, Maven/Gradle
- Add dependencies: Spring Web, Spring AI (choose model starter), Spring Data JPA (if needed)
- Configure application.yaml: API keys, model options, vector store connections
- Set up Docker Compose for local services (PGVector, Redis, Chroma)
Stage 3: Core AI Layer Development
- Configure ChatModel bean (auto-configured via starter)
- Build ChatClient with system prompt, memory, and advisors
- Define tool beans with @Tool and @Component
- Implement PromptTemplate for dynamic prompt construction
- Add streaming endpoint for real-time response delivery
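A sketch of that streaming endpoint, delivering tokens over server-sent events; the route is illustrative:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
public class ChatController {

    private final ChatClient chatClient;

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    // Server-sent events: chunks reach the client as the model generates them
    @GetMapping(value = "/api/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<String> stream(@RequestParam String message) {
        return chatClient.prompt().user(message).stream().content();
    }
}
```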
Stage 4: Data Ingestion Pipeline
- Choose DocumentReader(s) for data sources
- Configure TextSplitter (TokenTextSplitter with 512-token chunks, 50 overlap)
- Configure EmbeddingModel (OpenAI text-embedding-3-small recommended)
- Configure VectorStore (PGVector for production)
- Build ingestion service: Read -> Split -> Embed -> Store
- Run ingestion pipeline: CLI runner, scheduled job, or event-driven
Stage 5: RAG Query Pipeline
- Add QuestionAnswerAdvisor to ChatClient with VectorStore
- Configure topK and similarityThreshold
- Add metadata filters for document access control
- For advanced RAG: compose modular pipeline components
- Add query transformation if simple retrieval isn't sufficient
Stage 6: API Layer
- Build REST controller with ChatClient injection
- Add streaming endpoint using Flux
- Add ingestion endpoint for document upload
- Add conversation management endpoints (start, continue, clear)
- Add authentication/authorization (Spring Security)
Stage 7: Observability & Evaluation
- Add Micrometer + Prometheus dependencies
- Configure token usage metrics collection
- Build evaluation test suite with RelevancyEvaluator
- Set up Grafana dashboard for AI KPIs
- Add structured logging with trace IDs
Stage 8: Production Hardening
- Add rate limiting per user/API key
- Implement circuit breakers for AI API calls (Resilience4j)
- Add retry logic with exponential backoff
- Configure API key rotation
- Add input/output content filtering
- Add cost alerts and budget limits
6.2 Reverse Engineering Method
Reverse engineering an existing AI application built with Spring AI:
Step 1: Map the Entry Points
- Find all @RestController classes handling AI-related routes
- Identify ChatClient or ChatModel injection points
- Trace the request flow from HTTP endpoint to AI call
Step 2: Understand the Prompt Architecture
- Find all PromptTemplate, @Value loaded prompts, and SystemMessage configurations
- Understand the system persona, instructions, and constraints
- Identify all {variable} substitution points
- Check for multi-turn conversation memory configuration
Step 3: Identify the Retrieval Pipeline
- Find VectorStore beans and their configuration
- Identify QuestionAnswerAdvisor or custom RAG advisors
- Trace document ingestion: DocumentReader -> TextSplitter -> VectorStore
- Check metadata filtering strategy
Step 4: Map Tool Definitions
- Find all @Tool-annotated methods
- Understand what actions the AI can take in the system
- Identify tool result handling and error scenarios
Step 5: Trace the Advisor Chain
- List all Advisor beans and their order
- Understand what each advisor adds or modifies
- Identify memory, safety, logging, and RAG advisors
Step 6: Identify Configuration
- application.yaml: model provider, model name, temperature, tokens
- Vector store connection: host, port, collection/table name
- Memory store: type and configuration
- Observability: metrics, tracing configuration
Step 7: Reproduce & Modify
- Replicate core functionality in a test environment
- Substitute components with alternatives (e.g., swap OpenAI for Ollama)
- Add or remove advisors to change behavior
- Experiment with different chunking strategies
7. CUTTING-EDGE DEVELOPMENTS (2025β2026)
7.1 Spring AI 1.0 GA (May 2025)
- First production-ready release with stable APIs
- MCP Client and Server GA: connect Spring apps to any MCP tool ecosystem
- Modular RAG pipeline with all components
- Full advisor API with ordering and composition
- Comprehensive vector store support (15+ providers)
- Agent framework: workflow and autonomous agent implementations
7.2 Spring AI 1.1 (Late 2025)
- OAuth2-secured MCP server connections
- Multi-protocol MCP version negotiation (2024-11-05 and 2025-03-26)
- Deep integration with latest MCP Java SDK
- Redis-based chat memory repository
- Enhanced observability hooks
- Additional model provider integrations
7.3 Spring AI 2.0 (2026)
- Built on Spring Boot 4.0 and Spring Framework 7.0
- GraalVM native image AOT compilation improvements (contributed by Netflix/Bedrin)
- Official OpenAI Java SDK native integration
- Kotlin 2.2.x compatibility
- Default model updated to GPT-5-mini
- Removal of default temperature; explicit configuration required
- Testcontainers 2.0 integration
7.4 Industry Trends Influencing Spring AI Roadmap
Agentic AI (2025 is the Year of Agents)
- Multi-agent orchestration frameworks
- Agent memory: episodic, semantic, and procedural memory types
- Agent safety: bounded execution, resource limits, human-in-the-loop
- Computer Use: Agents controlling desktop/browser (Spring AI in Chrome, Excel, PowerPoint)
Model Context Protocol Ecosystem
- MCP becoming the standard for AI tool interoperability
- Hundreds of MCP servers in the community ecosystem
- Spring AI as a first-class MCP citizen (client + server)
Open-Weight Model Surge
- Llama 3.x, Phi-4, Gemma 3, Qwen 2.5, Mistral 24B rivaling closed models
- Ollama + Spring AI enabling full local AI stacks
- Hybrid deployments: local for privacy-sensitive data, cloud for complex reasoning
Multimodal Expansion
- Vision + Text becoming standard for enterprise AI apps
- Audio transcription + synthesis in agent workflows
- PDF and document intelligence as first-class use case
RAG Evolution
- Moving beyond simple RAG to GraphRAG (knowledge graph + vector)
- Agentic RAG: AI decides when and how to retrieve
- Long-context models reducing (but not eliminating) need for retrieval
- Reranking as standard practice for production RAG
8. BUILD IDEAS: BEGINNER TO ADVANCED
BEGINNER LEVEL (Phase 1-2 Skills)
Build 1: AI Personal Assistant API (Beginner)
Goal: Simple ChatClient with a system prompt defining a helpful assistant persona
Skills: ChatClient, PromptTemplate, streaming
Build 2: AI Text Summarizer (Beginner)
Goal: Accept long text or URL, return structured summary
Skills: Structured Output, BeanOutputConverter, PromptTemplate
Build 3: Multi-Language Translator (Beginner)
Goal: Accept text + target language, return translation
Skills: ChatClient, PromptTemplate, Structured Output
Build 4: Code Reviewer Bot (Beginner)
Goal: Accept code snippet, return analysis
Skills: ChatClient, Structured Output, prompt engineering
Build 5: AI FAQ Generator (Beginner)
Goal: Accept a document, generate a structured FAQ
Skills: ChatClient, Document reading, Structured Output
INTERMEDIATE LEVEL (Phase 3-4 Skills)
Build 6: Document Q&A System (RAG) (Intermediate)
Goal: Upload PDFs/documents, ingest, query with grounded answers
Skills: DocumentReader, TokenTextSplitter, EmbeddingModel, PGVector, QuestionAnswerAdvisor
Build 7: Conversational Customer Support Bot (Intermediate)
Goal: Multi-turn conversation with memory and RAG
Skills: ChatMemory, QuestionAnswerAdvisor, Tool Calling, MessageChatMemoryAdvisor
Build 8: AI-Powered Code Generation Assistant (Intermediate)
Goal: Accept user description, generate and review code
Skills: Tool Calling, multi-step prompting, Structured Output
Build 9: Research Assistant with Web Search (Intermediate)
Goal: Agent searches web for current information and summarizes
Skills: Agents, WebSearchTool, ReAct pattern, Tool Calling
Build 10: Multi-Source Knowledge Base (Intermediate)
Goal: Ingest from multiple sources with metadata filtering
Skills: Multiple DocumentReaders, metadata filtering, modular RAG
Build 11: AI Email Assistant (Intermediate)
Goal: Connect to email via MCP, classify, summarize, draft responses
Skills: MCP integration, Tool Calling, Structured Output
ADVANCED LEVEL (Phase 5-6 Skills)
Build 12: Autonomous Research Agent (Advanced)
Goal: Given research topic, autonomously search, ingest, cross-reference, report
Skills: Autonomous Agents, dynamic RAG, Tool Calling, multi-step planning
Build 13: Enterprise Document Intelligence Platform (Advanced)
Goal: Multi-tenant document management with RBAC and analytics
Skills: Multi-tenancy, metadata filtering, Observability, Security
Build 14: AI-Powered Data Analysis Platform (Advanced)
Goal: Natural language to SQL, chart generation, anomaly detection
Skills: MCP, Tool Calling, Structured Output, Agents, multimodal output
Build 15: Multimodal AI Expense Tracker (Advanced)
Goal: Upload receipt images, extract data via vision, categorize, report
Skills: Multimodal (vision), Structured Output, Tool Calling, ChatMemory
Build 16: AI Code Review & CI/CD Integration (Advanced)
Goal: GitHub webhook integration, automated PR review with suggestions
Skills: MCP, Agent, Tool Calling, Evaluation framework
Build 17: Multi-Agent Legal Document Analyzer (Advanced)
Goal: Multiple specialized agents extract, identify risks, compare, summarize
Skills: Multi-agent architecture, MCP, RAG, Structured Output
Build 18: Production AI Platform with Full Observability (Advanced)
Goal: Complete API gateway with auth, rate limiting, tracing, A/B testing
Skills: All Phase 6 skills, full production stack
9. FLOW DIAGRAMS & REFERENCE STRUCTURES
9.1 Basic Chat Flow
User Request
     |
     v
HTTP POST /api/chat {message: "..."}
     |
     v
ChatController
     |
     v
ChatClient.builder()
    .defaultSystem("You are a helpful assistant")
    .defaultAdvisors(advisor1, advisor2)
     |
     v
Advisor Chain Processing (pre-call)
   - Memory Advisor: Inject conversation history
   - RAG Advisor: Retrieve and inject relevant context
   - Safety Advisor: Check input for harmful content
     |
     v
ChatModel.call(Prompt) -> AI Provider API
     |
     v
Advisor Chain Processing (post-call)
   - Memory Advisor: Save new messages to memory
   - Logging Advisor: Log request and response
     |
     v
ChatResponse -> Generation -> Content
     |
     v
HTTP Response {answer: "..."}
9.2 RAG Pipeline Flow
INGESTION (Offline)
-------------------
DocumentReader(s) -> Extract raw text/metadata from source
     |
     v
TextSplitter -> Split into chunks (e.g., 512 tokens, 50 overlap)
     |
     v
MetadataEnricher -> Add source, date, department, etc.
     |
     v
EmbeddingModel.embed() -> Convert chunks to vectors
     |
     v
VectorStore.add() -> Persist embeddings
QUERY (Runtime)
---------------
User Query
     |
     v
QueryTransformer -> Rewrite/expand/translate query
     |
     v
EmbeddingModel.embed() -> Embed user query
     |
     v
VectorStore.similaritySearch() -> Top-K similar chunks
     |
     v
DocumentReranker -> Reorder by cross-encoder score
     |
     v
ContextualQueryAugmenter -> Inject context into prompt
     |
     v
ChatModel.call() -> Generate answer grounded in context
     |
     v
Answer + Source Citations
9.3 Agent ReAct Loop
User Task / Goal
     |
     v
Agent.call()
     |
     v
+--------------------------------------+
|              ReAct Loop              |
|                                      |
|  THINK:   Analyze task               |
|           Select next action         |
|           +- Terminate? -> Answer    |
|           |                          |
|  ACT:     Call tool(s)               |
|           Execute action             |
|           |                          |
|  OBSERVE: Process tool result        |
|           Update working memory      |
|           +- Loop back to THINK      |
+--------------------------------------+
     |
     v
Final Answer
9.4 MCP Architecture
Your Spring Boot App
β
βββ MCP Client βββββββββββββββββββΊ External MCP Server A (Filesystem)
β (stdio or HTTP/SSE transport)
βββ MCP Client βββββββββββββββββββΊ External MCP Server B (GitHub)
β
βββ MCP Server βββββββββββββββββββ Claude Desktop / Cursor / Other AI
(Exposes @Tool beans as MCP)9.5 Spring AI Module Dependencies Map
Core Modules
├── spring-ai-core (interfaces, models, advisors framework)
├── spring-ai-client-chat (ChatClient API)
└── spring-ai-rag (RAG pipeline components)
Model Provider Modules
├── spring-ai-openai
├── spring-ai-anthropic
├── spring-ai-azure-openai
├── spring-ai-vertex-ai-gemini
├── spring-ai-bedrock-converse
├── spring-ai-ollama
└── spring-ai-mistral-ai
Vector Store Modules
├── spring-ai-pgvector-store
├── spring-ai-redis-store
├── spring-ai-chroma-store
├── spring-ai-milvus-store
└── spring-ai-pinecone-store
Document Reader Modules
├── spring-ai-tika-document-reader
├── spring-ai-pdf-document-reader
└── spring-ai-jsoup-document-reader
MCP Modules
├── spring-ai-starter-mcp-client
└── spring-ai-starter-mcp-server
Spring Boot Auto-Configuration Modules
└── spring-ai-spring-boot-autoconfigure
    (Auto-configures all of the above via starters)
10. LEARNING RESOURCES
Official Resources
- Spring AI Docs: https://docs.spring.io/spring-ai/reference/
- Spring AI GitHub: https://github.com/spring-projects/spring-ai
- Awesome Spring AI: https://github.com/spring-ai-community/awesome-spring-ai
- Spring Initializr: https://start.spring.io
- Spring AI Blog: https://spring.io/blog (filter by "Spring AI")
Books
- "Beginning Spring AI" β Andrew Lombardi & Joseph Ottinger (Apress, March 2025)
- Spring AI Reference Documentation (official, free, comprehensive)
Community
- Spring AI GitHub Discussions
- Spring Community Forum: https://community.spring.io
- Stack Overflow: [spring-ai] tag
- Spring AI YouTube Channel (SpringDeveloper)
Key Conference Talks (2025)
- "The State of the Art of Spring AI" β Josh Long, GOTO 2025
- "Modular RAG Architectures with Java and Spring AI" β Thomas Vitale, Spring I/O 2025
- "Code Smarter, Not Harder: AI-Powered Dev Hacks" β Dan Vega, Spring I/O 2025
- "LangChain4j vs Spring AI" β J-Spring 2025
QUICK REFERENCE: SPRING AI DECISION MATRIX
| Requirement | Recommended Component |
|---|---|
| Simple Q&A chat | ChatClient + SystemPrompt |
| Multi-turn conversation | ChatClient + MessageChatMemoryAdvisor |
| Answer from documents | QuestionAnswerAdvisor + VectorStore |
| Complex document Q&A | Modular RAG pipeline |
| Call external APIs | @Tool annotated methods |
| Autonomous task execution | ReAct Agent |
| Connect to external tools | MCP Client |
| Expose app as AI tool | MCP Server |
| Extract structured data | BeanOutputConverter |
| Analyze images | Multimodal ChatModel + Media |
| Local models (no cloud) | Ollama + spring-ai-ollama-starter |
| Production vector DB | PGVector (if Postgres) or Pinecone |
| Conversation persistence | JdbcChatMemoryRepository |
| Token usage tracking | Micrometer + Prometheus |
| Test AI quality | RelevancyEvaluator + FactCheckingEvaluator |
| Enterprise secrets | Spring Cloud Vault + @Value |